

9e9f0ffc3d836836ca96cbf8fe14b105-Supplemental-Conference.pdf

Neural Information Processing Systems

In a nutshell, the features of this dataset are sampled randomly from N(0,1), and the target is produced by an ensemble of randomly constructed decision trees applied to the sampled features. Our dataset has 10,000 objects and 8 features, and the target was produced by 16 decision trees of depth 6. CatBoost is trained with the default hyperparameters. Importantly, the latter means that this approach is not covered by the embedding framework described in subsection 3.1. So, it seems to be important to embed each feature separately as described in subsection 3.1.
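The construction above can be sketched in a few lines of NumPy. This is an illustrative reconstruction under stated assumptions, not the authors' exact generator: we assume oblivious trees (one random feature and random N(0,1) threshold per level) with random N(0,1) leaf values, and average the trees to form the target.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_tree_predict(X, depth, rng):
    """Evaluate one randomly constructed (oblivious) decision tree on X.

    Assumption: at each level a random feature is compared against a random
    N(0,1) threshold; the 2**depth leaves hold random N(0,1) values.
    """
    leaf = np.zeros(X.shape[0], dtype=int)
    for _ in range(depth):
        feat = rng.integers(0, X.shape[1])
        thr = rng.normal()
        leaf = 2 * leaf + (X[:, feat] > thr).astype(int)
    leaf_values = rng.normal(size=2 ** depth)
    return leaf_values[leaf]

# As described: 10,000 objects, 8 N(0,1) features,
# target from an ensemble of 16 trees of depth 6.
X = rng.normal(size=(10_000, 8))
y = np.mean([random_tree_predict(X, depth=6, rng=rng) for _ in range(16)],
            axis=0)
```

By construction the target is a piecewise-constant function of the features, which is exactly the kind of signal GBDT models fit natively, hence "GBDT-friendly".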



SG-XDEAT: Sparsity-Guided Cross-Dimensional and Cross-Encoding Attention with Target-Aware Conditioning in Tabular Learning

Cheng, Chih-Chuan, Tseng, Yi-Ju

arXiv.org Artificial Intelligence

We propose SG-XDEAT (Sparsity-Guided Cross Dimensional and Cross-Encoding Attention with Target Aware Conditioning), a novel framework designed for supervised learning on tabular data. At its core, SG-XDEAT employs a dual-stream encoder that decomposes each input feature into two parallel representations: a raw value stream and a target-conditioned (label-aware) stream. These dual representations are then propagated through a hierarchical stack of attention-based modules. SG-XDEAT integrates three key components: (i) Cross-Dimensional self-attention, which captures intra-view dependencies among features within each stream; (ii) Cross-Encoding self-attention, which enables bidirectional interaction between raw and target-aware representations; and (iii) an Adaptive Sparse Self-Attention (ASSA) mechanism, which dynamically suppresses low-utility tokens by driving their attention weights toward zero--thereby mitigating the impact of noise. Empirical results on multiple public benchmarks show consistent gains over strong baselines, confirming that jointly modeling raw and target-aware views--while adaptively filtering noise--yields a more robust deep tabular learner.
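The abstract does not give equations for ASSA; one common way to realize "attention weights driven toward zero" is to mix a dense softmax branch with a sparse squared-ReLU branch, where negative scores map to exactly zero weight. The sketch below is our illustrative assumption (single head, fixed mixing weights), not the paper's implementation:

```python
import numpy as np

def adaptive_sparse_attention(Q, K, V, w_sparse=0.5, w_dense=0.5):
    """Sketch of an ASSA-style mix (mixing weights here are fixed constants;
    in an adaptive variant they would be learned per layer or per head)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # (n_tokens, n_tokens)

    # Dense branch: standard numerically stable softmax attention.
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    dense = e / e.sum(axis=-1, keepdims=True)

    # Sparse branch: squared ReLU zeroes out low-scoring (negative) entries,
    # then renormalizes, so low-utility tokens get exactly zero weight.
    relu2 = np.maximum(scores, 0.0) ** 2
    sparse = relu2 / (relu2.sum(axis=-1, keepdims=True) + 1e-9)

    attn = w_sparse * sparse + w_dense * dense
    return attn @ V
```

The sparse branch gives hard zeros for noise tokens while the dense branch keeps gradients flowing everywhere, which matches the stated goal of suppressing low-utility tokens without discarding them outright.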


Supplementary material A for numerical features

Neural Information Processing Systems

We provide a visual explanation of how embeddings are passed to the MLP in Figure 2 and Figure 3. We also provide a visualisation of target-aware PLE (subsubsection 3.2.2) in Figure 4 ("Obtaining bins for PLE from decision trees"). We used the following datasets: Gesture Phase Prediction (Madeo et al. [27]) and Churn Modeling. We follow the pointwise approach to learning-to-rank and treat this ranking problem as a regression problem. In this section, we apply the quantile-based piecewise linear encoding (described in subsubsection 3.2.1) to MLP and Transformer on the synthetic GBDT-friendly dataset described in section 5.1. The results are visualized in Figure 5. In this section, we test Fourier features implemented exactly as in Tancik et al. We mostly follow Gorishniy et al. [13] in terms of the tuning, training and evaluation protocols.
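Quantile-based piecewise linear encoding for a single numerical feature can be sketched as follows. This is a simplified reconstruction under our reading of the scheme (function name and epsilon are ours): bin edges are training-set quantiles, and component t of the encoding is 0 before bin t, 1 after it, and a linear ramp inside it.

```python
import numpy as np

def quantile_ple(x_train, x, n_bins=4):
    """Quantile-based piecewise linear encoding of one numerical feature.

    Bin edges are empirical quantiles of the training values; each of the
    n_bins output components is clip((x - left_edge) / bin_width, 0, 1).
    """
    edges = np.quantile(x_train, np.linspace(0.0, 1.0, n_bins + 1))
    lo, hi = edges[:-1], edges[1:]
    ramp = (x[:, None] - lo) / np.maximum(hi - lo, 1e-12)
    return np.clip(ramp, 0.0, 1.0)  # shape: (len(x), n_bins)
```

For example, with uniformly distributed training values and 4 bins, the minimum encodes to [0, 0, 0, 0], the median to [1, 1, 0, 0], and the maximum to [1, 1, 1, 1], so the encoding is monotone in x componentwise.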


Supplementary material

Neural Information Processing Systems

All the experiments were conducted under the same conditions in terms of software versions. The feature preprocessing for DL models is described in the main text. The preprocessing is then applied to the original features. The remaining notation follows that of the main text. For most experiments, training times can be found in the source code.


Practically Solving LPN in High Noise Regimes Faster Using Neural Networks

Jiang, Haozhe, Wen, Kaiyue, Chen, Yilei

arXiv.org Artificial Intelligence

We conduct a systematic study of solving the learning parity with noise problem (LPN) using neural networks. Our main contribution is designing families of two-layer neural networks that practically outperform classical algorithms in high-noise, low-dimension regimes. We consider three settings where the numbers of LPN samples are abundant, very limited, and in between. In each setting we provide neural network models that solve LPN as fast as possible. For some settings we are also able to provide theories that explain the rationale of the design of our models. Compared with the previous experiments of Esser, Kübler, and May (CRYPTO 2017), for dimension $n = 26$, noise rate $\tau = 0.498$, the ''Guess-then-Gaussian-elimination'' algorithm takes 3.12 days on 64 CPU cores, whereas our neural network algorithm takes 66 minutes on 8 GPUs. Our algorithm can also be plugged into the hybrid algorithms for solving middle or large dimension LPN instances.
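For readers unfamiliar with the problem setup, LPN samples are easy to generate: each sample is a uniform vector a over GF(2)^n together with the label b = <a, s> + e (mod 2), where s is the secret and the noise bit e is 1 with probability tau. The sketch below (function name and parameter choices are ours) instantiates the regime quoted above, n = 26 and tau = 0.498:

```python
import numpy as np

def lpn_samples(secret, m, tau, rng):
    """Generate m LPN samples over GF(2).

    Each row a of A is uniform in {0,1}^n; the label is
    b = <a, secret> + e (mod 2) with e ~ Bernoulli(tau).
    """
    n = secret.shape[0]
    A = rng.integers(0, 2, size=(m, n))
    noise = (rng.random(m) < tau).astype(int)
    b = (A @ secret + noise) % 2
    return A, b

rng = np.random.default_rng(1)
s = rng.integers(0, 2, size=26)                 # n = 26, as in the comparison
A, b = lpn_samples(s, m=100_000, tau=0.498, rng=rng)
```

At tau = 0.498 each label is wrong with probability barely below 1/2, which is why this regime is hard for classical decoding and why the sample complexity and running time blow up.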